A Weighted Overlap Add-based Front-end for Speech Recognition
Authors
Abstract
Speech signal enhancement is frequently described as a preprocessing step for speech recognition. In practice, however, the two are not easily combined, because their front-end signal processing techniques and parameters often differ. We apply a signal processing technique that has been used successfully in speech enhancement to speech recognition and show that it performs on par with well-known speech recognition front-ends such as MFCC. The technique, oversampled filterbank analysis/synthesis through weighted overlap-add (WOLA), was tested on the TI-46 and Aurora tasks in both clean and noisy conditions, as well as in subband speech recognition, and performed satisfactorily. The results indicate that this technique can merge the front-end signal processing blocks of enhancement and recognition into a single block.
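As a rough illustration of the idea in the abstract, the sketch below runs an oversampled windowed-FFT analysis, derives log subband energies that a recognizer could use as features, and resynthesizes the signal by weighted overlap-add as an enhancement path would. It is a simplified sketch, not the authors' implementation: the window length equals the FFT size (practical WOLA filterbanks often use a longer prototype window with time-folding), and the function names, the square-root Hann window, and the parameters `n_fft=64` and `hop=16` are all assumptions chosen for the example.

```python
import numpy as np

def wola_analysis(x, n_fft=64, hop=16):
    """Oversampled analysis: windowed FFT every `hop` samples (hop < n_fft)."""
    # Periodic square-root Hann window (an assumed choice, not from the paper).
    win = np.sqrt(np.hanning(n_fft + 1)[:-1])
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop:i * hop + n_fft] * win for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1), win

def wola_synthesis(spec, win, hop, length):
    """Weighted overlap-add: inverse FFT, window again, add, then normalize."""
    n_fft = len(win)
    y = np.zeros(length)
    norm = np.zeros(length)
    for i, frame in enumerate(np.fft.irfft(spec, n=n_fft, axis=1)):
        start = i * hop
        y[start:start + n_fft] += frame * win
        norm[start:start + n_fft] += win ** 2
    covered = norm > 1e-8          # skip samples where the window weight is zero
    y[covered] /= norm[covered]
    return y

def log_band_energies(spec):
    """Hypothetical recognition features: log energy per subband and frame."""
    return np.log(np.abs(spec) ** 2 + 1e-10)

rng = np.random.default_rng(0)
x = rng.standard_normal(1024)
spec, win = wola_analysis(x)
features = log_band_energies(spec)            # shared by a recognizer
y = wola_synthesis(spec, win, 16, len(x))     # shared by an enhancer
```

Both the feature path and the resynthesis path consume the same `spec` array, which is the sense in which a single analysis block could serve enhancement and recognition. An enhancement stage would modify `spec` (for example, by applying per-band gains) before synthesis.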
Similar resources
A low-resource, miniature implementation of the ETSI distributed speech recognition front-end
The purpose of this work is to demonstrate that distributed speech recognition front-ends can be deployed in environments which provide for very little power and CPU resources, with possibly no degradation of speech recognition quality when compared to standard floating-point implementations. The ETSI distributed speech recognition front-end standard is implemented on an ultra low-power miniatur...
Spoken Term Detection for Persian News of Islamic Republic of Iran Broadcasting
Islamic Republic of Iran Broadcasting (IRIB), as one of the biggest broadcasting organizations, produces thousands of hours of media content daily. Accordingly, IRIB's archive is one of the richest archives in Iran, containing a huge amount of multimedia data. Monitoring this massive volume of data, and browsing and retrieving from this archive, is one of the key issues for this broadcasting...
A Noise-Robust ASR Back-end Techniqu Recognition
The performance of speech recognition systems trained in quiet conditions degrades significantly under noisy conditions. To address this problem, a Weighted Viterbi Recognition (WVR) algorithm that is a function of the SNR of each speech frame is proposed. The acoustic models, trained on clean data, and the acoustic front-end features are kept unchanged in this approach. Instead, a confidence/robustness factor...
Interpolate to Enhance for NonStationary Signal Processing
Enhancing non-stationary signals is crucial for many applications, such as speech recognition, audio communication, and bio-signal analysis. The present paper investigates a novel processing structure (an alternative to the overlap-add scheme) based on interpolated zero-phase FIR filtering. The proposed structure accounts for slow signal non-stationarity, and also natively supports time and f...